Ziyue Wang

Evaluating Frontier Models for Stealth and Situational Awareness

May 02, 2025
Via arXiv

CoSpace: Benchmarking Continuous Space Perception Ability for Vision-Language Models

Mar 18, 2025
Via arXiv

BEVDiffLoc: End-to-End LiDAR Global Localization in BEV View based on Diffusion Model

Mar 14, 2025
Via arXiv

How Do Multimodal Large Language Models Handle Complex Multimodal Reasoning? Placing Them in An Extensible Escape Game

Mar 13, 2025
Via arXiv

SurgRAW: Multi-Agent Workflow with Chain-of-Thought Reasoning for Surgical Intelligence

Mar 13, 2025
Via arXiv

EgoLife: Towards Egocentric Life Assistant

Mar 05, 2025
Via arXiv

DongbaMIE: A Multimodal Information Extraction Dataset for Evaluating Semantic Understanding of Dongba Pictograms

Mar 05, 2025
Via arXiv

Perspective Transition of Large Language Models for Solving Subjective Tasks

Jan 16, 2025
Via arXiv

Covariance-Based Device Activity Detection with Massive MIMO for Near-Field Correlated Channels

Nov 08, 2024
Via arXiv

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities

Oct 10, 2024
Via arXiv